2/4/2018

Why drug crime?

  • Community impact
  • Cooler than parking tickets
  • Easy access to ~31,000 observations

Goals of this talk

  • Show the spatial distribution of crime in the city
  • Get someone excited about R or open data
  • Provide a tidyverse template for analyzing raw police reports

Where is crime happening?

Most frequent addresses

Top 3 addresses for drug crime and non-drug crime

arrange(crime_counts, -n) %>% group_by(drug_flag) %>% slice(1:3)
## # A tibble: 6 x 4
## # Groups:   drug_flag [2]
##                               address drug_flag     n        geometry
##                                 <chr>     <chr> <int> <S3: sfc_POINT>
## 1  600 E MARKET ST Charlottesville VA     drugs   410 <S3: sfc_POINT>
## 2   400 GARRETT ST Charlottesville VA     drugs    38 <S3: sfc_POINT>
## 3 700 PROSPECT AVE Charlottesville VA     drugs    38 <S3: sfc_POINT>
## 4  600 E MARKET ST Charlottesville VA not_drugs   635 <S3: sfc_POINT>
## 5 700 PROSPECT AVE Charlottesville VA not_drugs   412 <S3: sfc_POINT>
## 6   1100 5TH ST SW Charlottesville VA not_drugs   341 <S3: sfc_POINT>

The police station's address is 606 E Market Street…

What is going on at the police station?

"…when individuals walk in to the police department to file a report the physical address of the department (606 E Market Street) is often used in that initial report if no other known address is available at the time. This is especially true for incidents of found or lost property near the downtown mall where there is no true known incident location. The same is true for any warrant services that result in a police report occurring at the police department." - CPD response

Test if the proportions are different:

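# Add each group's total count (stored as `nn`, because `n` already exists),
# then keep the top address per group: the police station block in both cases
station_props <- arrange(crime_counts, -n) %>%
    group_by(drug_flag) %>%
    add_count(wt = n) %>%
    slice(1)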

with(station_props, prop.test(n, nn))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  n out of nn
## X-squared = 2135.5, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1810225 0.2196687
## sample estimates:
##     prop 1     prop 2 
## 0.22210184 0.02175626
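The mechanics of that test are easier to see on a tiny example. The sketch below uses made-up counts (the `toy_counts` table and its addresses are hypothetical, not the talk's data), computes each group's total with `mutate()` instead of `add_count()`, and picks the address of interest by name rather than with `slice(1)`.

```r
library(dplyr)

# Hypothetical counts, invented for illustration only
toy_counts <- tibble::tibble(
  address   = c("A ST", "B ST", "C ST", "A ST", "B ST", "C ST"),
  drug_flag = c("drugs", "drugs", "drugs",
                "not_drugs", "not_drugs", "not_drugs"),
  n         = c(40, 10, 10, 60, 500, 440)
)

toy_props <- toy_counts %>%
  group_by(drug_flag) %>%
  mutate(total = sum(n)) %>%   # group total, the role `nn` plays above
  ungroup() %>%
  filter(address == "A ST")    # one row per group for the address of interest

# Compare the share of each group's reports filed at "A ST"
toy_test <- with(toy_props, prop.test(n, total))
toy_test$estimate
```

Here `n` is the number of reports at the address and `total` is the number of reports in that group, so each estimate is simply `n / total`.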

Aggregate into areas

Census blocks make a lot of sense because:

  • Tons of data in Census and American Community Surveys
  • Reputable source with code books and APIs
  • Easy to access in R via the Open Data Portal (ODP) and library(tidycensus)

Doing it

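# Read the census block polygons as GeoJSON and convert them to an sf object
census <- geojsonio::geojson_read("https://opendata.arcgis.com/datasets/e60c072dbb734454a849d21d3814cc5a_14.geojson",
                                  what = "sp") %>% sf::st_as_sf()

# Map population by block; scale_fill_viridis() comes from the viridis package
ggplot(census) +
    geom_sf(aes(fill = Population)) +
    scale_fill_viridis()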
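One possible way to aggregate incident points into areas is a spatial join followed by a count. The sketch below is not necessarily the approach used in the talk: it runs on made-up geometry (`toy_blocks`, `toy_points`, and `block_id` are hypothetical names) and assumes both layers share the same coordinate reference system. With real data, the points would need `sf::st_transform()` to match the polygons first.

```r
library(sf)
library(dplyr)

# Helper that builds a unit square with its lower-left corner at (x0, y0)
square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two hypothetical "blocks" side by side (planar coordinates, no CRS)
toy_blocks <- st_sf(
  block_id = c("block_1", "block_2"),
  geometry = st_sfc(square(0, 0), square(1, 0))
)

# Three hypothetical incident points
toy_points <- st_sf(
  drug_flag = c("drugs", "not_drugs", "not_drugs"),
  geometry = st_sfc(st_point(c(0.5, 0.5)),
                    st_point(c(0.2, 0.8)),
                    st_point(c(1.5, 0.5)))
)

# Attach the containing block to each point, then count by block and flag
block_counts <- toy_points %>%
  st_join(toy_blocks, join = st_within) %>%
  st_drop_geometry() %>%
  count(block_id, drug_flag)

block_counts
```

Joining the resulting counts back onto the polygon layer by `block_id` would allow them to be mapped with `geom_sf()` in the same way as `Population`.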